DENSITY APPROXIMATIONS FOR MULTIVARIATE AFFINE 
JUMP-DIFFUSION PROCESSES 



DAMIR FILIPOVICS EBERHARD MAYERHOFER^ AND PAUL SCHNEIDER^ 

Abstract. We introduce closed-form transition density expansions for multivariate affine 
jump-diffusion processes. The expansions rely on a general approximation theory which we 
develop in weighted Hilbert spaces for random variables which possess all polynomial moments. 
We establish parametric conditions which guarantee existence and differentiability of transition 
densities of afSne models and show how they naturally fit into the approximation framework. 
Empirical applications in credit risk, likelihood inference, and option pricing highlight the use- 
fulness of our expansions. The approximations are extremely fast to evaluate, and they perform 
very accurately and numerically stable. 



1. Introduction 

Most observed phenomena in financial markets are inherently multivariate: stochastic trends, 
stochastic volatility, and the leverage effect in equity markets are well-known examples. The 
theory of affine processes provides multivariate stochastic models with a well established theo- 
retical basis and sufficient degree of tractability to model such empirical attributes. They enjoy 
much attention and are widely used in practice and academia. Among their best-known propo- 
nents are Vasicek's interest rate model (Vasicck, 1977), the square-root model Cox ct al. (1985), 
Heston's model (cf. Heston, 1993), and affine term structure models (Duffic and Kan, 1996; 
Dai and Singleton, 2000; Collin-Dufresne et al., 2008). Affine models owe their popularity and 
their name to their key defining property: their characteristic function is of exponential affine 
form and can be computed by solving a system of generalized Riccati differential equations (cf. 
Duffic ct al. (2003)). This allows for computing transition densities and transition probabilities 
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by means of Fourier inversion (Dviffie ct al, 2000). Transition densities constitute the likelihood 
which is an ingredient for both frequentist and Bayesian econometric methodologies/ Also, they 
appear in the pricing of financial derivatives. However, Fourier inversion is a very delicate task. 
Complexity and numerical difficulties increase with the dimensionality of the process. Efficient 
density approximations avoiding the need for Fourier inversion are therefore desirable. 

This paper is concerned with directly approximating the transition density without resort- 
ing to Fourier inversion techniques. We pursue a polynomial expansion approach, an idea that 
has been proposed by Schoutcns (2000), Ai't-Sahalia (2002) and Hurn et al. (2008) among oth- 
ers for univariate diffusion processes. Extensions for multivariate (jump-) diffusions do exist in 
Ai't-Sahalia (2008) and Yu (2007), but they follow a different route by approximating the Kol- 
mogorov forward-, and backward partial differential equations. Our approach exploits a crucial 
property of affine processes. Under some technical conditions, conditional moments of all orders 
exist and are explicitly given in terms of derivatives of the affine characteristic exponential func- 
tion, see Duffic ct al. (2000). This ensures that the coefficients of the polynomial expansions 
can be computed without approximation error. 

We present a general theory of density approximations with several traits of the affine model 
class in mind. The assumptions made for the general theory are then justified by proving 
existence and differentiability of the true, unknown transition densities of affine models. These 
theoretical results, contrary to the density approximations themselves, do rely on Fourier theory. 
Specifically we investigate the asymptotic behavior of the characteristic function with novel ODE 
techniques. 

We improve earlier work, along several lines. Our method (i) is applicable to multivariate 
models; (ii) works equally well for reducible and irreducible processes in the sense of A'it-Sahalia 
(2008)^, in particular stochastic volatility models; (iii) produces density approximations the 
quality of which is independent of the time interval between observations; (iv) allows for expan- 
sions on the "correct" state space. That is, the support of the density approximation agrees 
with the support of the true, unknown transition density as in Hurn et al. (2008) and Schoutens 
(2000); (v) produces density approximations that integrate to unity by construction, hence are 
much more amenable to applications that demand the constant of proportionality than the 
purely polynomial expansions from Ai't-Sahalia (2008).^ A specialization on affine models is 
not a severe limitation, since virtually any continuous-time multivariate application is based 
on affine models."* This includes Wishart processes Bru (1991) and even general affine matrix- 
valued processes (Cucliicro ct al., 2010a). This paper therefore provides a unified framework for 



^Various other approaches for parameter estimation for discretely observed Markov processes can be found 
in the literature (excellent comprehensive surveys are for example in Hurn ct al., 2007; Sorcuscn, 2004; 
Ai't-Sahalia, 2007). The approaches range from likelihood approximation using Bayesian data augmentation 
(Roberts and Strainer, 2001; Elcriau ct al., 2001; Erakcr, 2001; Jones, 1998), estimating functions (Bibby ct al., 
2004), up to the efficient method of moment Gallant and Tauchcn (2009). Only few of them make use of the 
properties of afhne models, however (e.g. Singleton, 2001; Bates, 2006). 

model is said to be reducible Ait-Sahalia (2008) if its diffusion function can be transformed one-to-one into a 
constant) 

^The Markov chain Monte Carlo sampling schemes from Stranior et al. (2009) accommodate Bayesian likelihood- 
based inference using expansions from Ait-Salialia (2008) even in absence of the normalizing constant, but at a 
high computational cost. 

*In discrete-time, Lc ct al. (2010) show how Q-afEne models may be constructed to exhibit non-afiine dynamics 
under P. 
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econometric inference for financial models, because in applications one typically needs to eval- 
uate, both, the transition densities themselves, as well as integrals of payoff functions against 
the transition densities for model-based asset pricing. This complements the methods recently 
developed in Chen and Joslin (2011) and Kristcnscn and Mclc (2011), which are aimed at asset 
pricing only.^ 

The paper proceeds as follows: Section 2 develops a general theory of orthonormal polynomial 
density approximations in certain weighted C? spaces-under suitable integrability and regularity 
assumptions. These may be validated by the sufficient criteria presented subsequently in Sec- 
tion 3. The density approximations are then specialized within the context of affine processes: 
Section 4 reviews the affine transform formula and the polynomial moment formula for affine 
processes, which in turn allows the aforementioned polynomial approximations. The main the- 
oretical contribution-constituted by fairly general results on existence and differentiability of 
transition densities of affine processes- is elaborated in Section 4.3. In Section 5 we introduce 
candidate weight functions and the Gram-Schmidt algorithm to compute orthonormal polyno- 
mial bases corresponding to these weights, along with important examples. Section 6 relates 
existing techniques for density approximations to ours. An empirical study is presented in Sec- 
tion 7: applications in stochastic volatility (Section 7.2), credit risk (Section 7.3), likelihood 
inference (Section 7.4), and option pricing (Section 7.5), support the tractability and usefulness 
of the likelihood expansions. Section 8 concludes. The proofs of our main results are given in 
Appendices A-C. 

In the paper we will use the following notational conventions. The nonnegative integers 
are denoted by Nq. The length of a multi-index a = (ai,...,arf) G Nq is defined by |q| = 

ai H + "d, and we write = ■ ■ ■ C/ for any i G R*^. The de gree of a polynomial 

'pix) = X]|Q,|>oPa in X G M'^ is defined as degp(a::) = max{|a| | ^ 0}. For the likelihood 
ratio functions below we define 0/0 = 0. The class of p-times continuously differentiable (or 
continuous, if p = 0) functions on is denoted by C^. 

2. Density Approximations 
Let g denote a probability density on whose polynomial moments 

of every order a G Ng exist and are known in closed form. For example, g may denote the pricing 
density in a financial market model. Typically, g is not known in explicit form, and needs to be 
approximated. Let w be an auxiliary probability density function on W^. The aim is to expand 
the likelihood ratio g/w m. terms of orthonormal polynomials of w in order to get an explicit 
approximation for the unknown density function g. This can be formalized as follows. Define 
the weighted Hilbert space £^ as the set of (equivalence classes of) measurable functions / on 
with finite £^-norm defined by 

11/11^. = / i/(e)P^Orfe<oo. 

''It is of course conceivable to mix the mentioned methods. For example, one could use transition densities devel- 
oped in this paper, while approximating asset prices using the generalized Fourier transform in Chen and Joslin 
(2011), whenever the payoff function allows it, or the error expansion method from Kristcnscn and Mele (2011). 
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Accordingly, the scalar product on £^ is denoted by 

{f,h)cl= [ fiOHOHOd^. 

jRd 

We will now proceed under the following assumptions. Sufficient conditions for the assumptions 
to hold are provided in Section 3 below. 

Assumption 1. There exists an orthonormal basis of polynomials {Ha \ a G Nq} of £^ with 
degHa = \a\. This implies Hq = 1 in particular. 

Assumption 2. The likelihood ratio function g/w lies in This is equivalent to J^a d£, < 
oo. 



Consequently, the coefficients 



,Ha) = [ HaiO 9(0 d^ (= 1 for a = 0) 



g_ 

w 
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are well defined and given explicitly in terms of the coefficients of Ha and the polynomial 
moments Ha g. Moreover, according to standard £^-theory, the sequence of pseudo-likelihood 
ratios^ l+X]|L|=i ^aHa approximates the likelihood ratio g/w in £^ for J — oo. In fact, defining 
the pseudo-density functions^ 

(2.1) g'^'^\x) = w{x)ll+^CaHa{a 

\ l"l=i 

the following properties can be established. 
Theorem 2.1. The pseudo- density functions g^"'^ satisfy 

(2.2) / g(J)iC)dC=l 

(2.3) lim = — ^n £ 

J-s>oo w w 



(2.4) hm / g^'\i)-g{0 

J— !>00 



0. 



Property (2.2) proves to be very useful for applications where the constant of proportionality 
is needed, for example option pricing and the computation of Bayes factors. 

Proof. A calculation shows that 

/ Ha{i) w{i) di = {Ha, l)c2 = {Ha, Ho) ^2 = 0, 



'^This is an advantage over the method in Ait-Sahaha (2002) which also reUes on series expansions, where the 
coefficients are functions of expectations of nonlinear moments, and tlierefore have to be approximated in general. 

See Footnote 8 below for an explanation of this terminology. 
^Theorem 2.1 below states that integrates to one, but g^''^ may take negative values. Whence we shall call 
g^'^^ a pseudo-density function, and g^''^ /w a pseudo- likelihood ratio. 
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by the orthogonality of Ha and Hq = 1. Hence J^^d 9^'^\0 dS, = J^dw{S,)d(^ = 1, which proves 
(2.2). Properties (2.3) and (2.4) are formal restatements of the discussion preceding the theorem. 

□ 

The idea of expanding the likelihood ratio function g/w in orthonormal polynomials of w 
is simple and powerful. An overview and discussion of related literature can be found e.g. in 
Bernard (1995). In particular, for the case where w is the standard Gaussian density, (2.1) 
is actually the Gram-Charlier expansion of g. But note that Assumption 2 is very restrictive 
in this case. This is why the Gram-Charlier series diverges in most cases of interest, which is 
sometimes given as an argument against the use of it. However, the blame is on the choice of 
the Gaussian as auxiliary density. The efficiency of the approximation (2.3), or equivalently 
(2.4), lies in the appropriate choice of the auxiliary density function w and the corresponding 
orthonormal polynomials Ha- 

Here is a first result towards a good choice of w. The intuition is to choose w as close as 
possible to the unknown density function g, in the sense that the pseudo- likelihood ratio g/w is 
close to one. This should be achieved if many of the coefficients Cq, other than cq = 1, are equal 
to zero. This will also improve the numerical efficiency of the approximation as the respective 
orthonormal polynomials Ha need not be computed. Denote the polynomial moments of w by 

Xa= [ r«^(o«!e 

Lemma 2.2 (Moment Matching Principle). Suppose for some n > 1, we have fia = for all 
\a\ < n. Then = for 1 < |a| < n. 

Proof. The assumption implies that, for 1 < |a| < 

ca = / Haii) g{i) di= [ HaiO w{0 dC = {Ha, l)c2 = {Ha, Ho)c2 = 0, 
by the orthogonality of Ha and Hq = 1. □ 

3. Sufficient Conditions for Assumptions 1 and 2 

In this section we provide sufficient conditions for Assumptions 1 and 2 to hold. The proofs 
of the following lemmas are postponed to Appendix A. We first provide sufficient conditions on 
w that guarantee that Assumption 1 is satisfied. 

Lemma 3.1. Suppose that the density function w has a finite exponential moment 
(3.1) / e^«ll«l'u;(e)dC < oo 

for some eg > 0. Then the set of polynomials is dense in Moreover, Assumption 1 is 

satisfied. 

In applications, the auxiliary density function w on will often be given as product of 
marginal densities Wi on M. Hence the following modification of Lemma 3.1 will be useful. 

Lemma 3.2. Let wi, . . . ,Wd he density functions on M having finite exponential moments 

/ Wiiii) dii < oo 

Jr 
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for some > 0, i = I, . . . ,d. Then the product density w{^) = wi{S,i) ■ ■ ■ Wd{S,d) on M'^ admits 
a finite exponential moment (3.1) for eo = mirijei. Moreover, let {Hj \ j £ No} denote the 
corresponding orthonormal basis of polynomials o/£^,. (R) by degHj = j, for i = l,...,d, 
asserted by Lemma 3.1. Then 

defines an orthonormal basis of polynomials of C"^ with degH^ = \a\, and Assumption 1 is 
satisfied. 

Assumption 2 is opposite to Assumption 1 in the sense that there we have to bound the 
auxiliary density function w from below. The following lemmas provide sufficient conditions for 
Assumption 2 to hold. 

Lemma 3.3. Assume that g is bounded and has a finite exponential moment 

[ e^oll«ll5(Oc^^ < oo 
for some eo > 0. If w decays at most exponentially such that 

Q-eo\\x\\ 

(3.2) sup — < oo 

^gjjd w{x) 

then Assumption 2 is satisfied. 

If the support of w and g is contained in a subset T> of M'^, the situation becomes more difficult 
as one has to control the rate at which w converges to zero at the boundary of the support set. 
We provide sufficient conditions for the set T> = x M", starting with the scalar case T> = M_|.. 

Lemma 3.4. Let d = 1 and p G N. Assume that g is a bounded density with support in and 
has a finite exponential moment 

fOO 



poo 

/ e'°^g{Od^<oo 
Jo 



for some eo > 0. Assume further that g is of class . If w has support in and decays at 
most polynomially at zero and exponentially at infinity such that 

(3.3) sup — -— < oo and sup — -— < oo 

xe[o,i] w{x) w{x) 

then Assumption 2 is satisfied. 

The case where T> = x R" is similar, but requires stronger conditions on g and w. We 
respect the product structure of the domain by writing g = g{x, y) for x G R™ and y € M". The 
following tubular neighborhood of the boundary of T> 

I = \ (1, oo)" = {x£R"^\ mmxi < 1} 

i 

is the convenient multivariate generalization of the unit interval from the above scalar case. 
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Lemma 3.5. Let d = m + n and p G N. Assume that g{x, y) is a hounded density with support 
in X and has a finite exponential moment 

for some ei, €2 > 0. Assume further that g{x, y) is of class in x and the p-th partial derivative 
d§^g{x, y) is bounded on X x R", for all i = 1, . . . ,m. If w has support in M™ x M", and decays 
at most polynomially around the boundary and exponentially at infinity such that 

mini x^e~'^2llJ'll e~^ill^ll~^2||y|| 
(3.4) sup r < 00 and sup r < 00 

{x,y)eXxR" w[x,y) (x,y)e(i,oo)™'xR" w[x,y) 

then Assumption 2 is satisfied. 

We note that the conditions in Lemmas 3.3, 3.4 and 3.5 can be exphcitly verified for transition 
densities of affine processes, see Corohary 4.4 below. 



4. Affine Models 

The main application of the polynomial density approximation is for afhne factor models. 
In this section, we follow the setup of Duffic ct al. (2003), which we now briefly recap. Let 
d = m + n > 1. We define the index set J = {m + 1, . . . , fi}, and write vj = {vm+i, ■ ■ ■ , Vd) 
and mjj = {mki)^. i^j, for any vector v and matrix m. We consider an affine process X on the 
canonical state space T> = x M" with generator 

-^/(^) = E di^g (0' + E 0^ + {b + l3x)^ V/(x) 

k,l=l \ i=l / ^ '■ 

+ {fix + 0- fix) - Xj(e)^Vj/(x)) m{dO 

+ Jjf{x + 0-f{x)) (^^x,fi,{dO^ 

for some appropriate positive semidefinite n x n- and d x d-matrices a and a^, respectively. Here, 
with diag (0, a) we denote the block-diagonal d x d-matrix with blocks given by the m x m- 
zero matrix and a. Moreover, XjiO denotes an M'^-valued continuous and bounded truncation 
function with Xj(0 = & neighborhood of the origin ^ = 0. For detailed parametric 

restrictions on (a, a^, 6, /3, m, ^j) we refer the reader to Duf&e et al. (2003, Definition 2.6). We 
assume for simplicity'^ that the jump measures Hi are of finite variation type with integrable 
large jumps 

U\\ fJ-iidC) < 00, i = l,...,m. 



T 



At the cost of more technical analysis, the following results could also be proved for the general case of infinite 
variation jumps /Xi with infinite tail mean. 
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4.1. Affine Transform Formula. The analytical tractability of affine models stems from the 
fact that the characteristic function of Xt|Xo = x is explicitly given by the affine transform 
formula 

"3'"^^* I Xn = 



(4.1) E 



e 



where the C_- and C!!* x iR"-valued functions (p = (f>{t, iu) and ^ = ^(t, iu) solve the generalized 
Riccati equations, for i = 1, . . . , m, 

dtcP = ^P'jai^j + b^^P + j (e'/'^« - 1 - ^Jxj(e)) m{dO, 

m = 0, 

(4.2) dt^Pi = ^^a, ^P + Bi^P + (^e^^^ - l) fi^{dO, 

tpi{0) = iui, 
dtTpj = Bjjijjj, 
V'j(O) = i-uj, 

where we define B = (5^ and write Bi for the ith row vector of B. Obviously, we have 

V'j(i,in) = m'^'-'^uj, 
and (/>(t, iu) is given by simple integration of the right hand side of its equation. 

4.2. Polynomial Moments. It is well known that if Xt\XQ = x has finite k-th moment. 



E 



\Xtf \Xo = x 



< 00 for all X gD, 



then (j){t,u) and il){t,u) are of class in u. Moreover, the polynomial moments are explicitly 
given in terms of the respective mixed derivatives of the characteristic function 

E [x^ \Xq = x\ = -ii"i- — — ^mu)+^{t^uy 

for \a\ < k, see e.g. Duffie et al. (2003, Lemma A.l). It follows by inspection that the right 
hand side of this equation is a real polynomial in x of degree less than or equal to Re- 
cently, generalizing the recursive method used in Forinan and Sorcnscn (2008) for Pearson-type 
difi'usions, Cucliicro ct al. (20101:)) proposed an alternative method to compute the coefficients 
of this polynomial. The idea rests on the insight that the affine generator A formally maps Vk 
into Vk, where Vk denotes the finite-dimensional linear space of all polynomials in x G M*^ of 
degree less than or equal to k}^ The generator A thus restricts to a linear operator Ak on Vk- 
Consequently, we obtain the formal representation 

E [X^ \Xo = x] = e-^'=* x" 

where e'^''* = X^j>o ^"^jP is the exponential of Akt. This can be expressed as a matrix. We shall 
illustrate this for d = 1. The dimension of Vk then equals k + 1, and we can pick as canonical 



"'^'^This method is not restricted to affine processes, but can be defined for any Markov process with finite fc-th 
moments, and whose infinitesimal generator maps Vk into itself. 
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basis of Vk the set Q = {1, x, . . . , x'^}. For every j = 0, . . . , A; we then calculate symbolically the 
coefficients qij in 

k 

(4.3) AkX^ = Ax^ = ^ qijx\ 

i=0 

Hence Ak can be represented by the upper-triangular matrix Q = (qij) with respect to the basis 
Q. In other words, if we identify a generic polynomial p{x) = 'Yl!i=oPi^^ with the vector 

of its coefficients p = {po, . . . ,Pk)~^ , then Akp{x) € Vk equals the polynomial with coefficient 
vector Qp. Moreover, 

(4.4) E[p{Xt)\Xo = x] = e'^'p. 

See Examples 7.1 and 7.2 below for some concrete applications for d = 1 and d = 2. 

4.3. Existence and Properties of AfRne Transition Densities. In this section we present 
our main theoretical results, which establish existence and smoothness of the density of the con- 
ditional distribution of the affine process X. Moreover, we provide explicitly verifiable conditions 
asserting that Lemmas 3.3-3.5 apply. 

Our first result provides sufficient conditions for the existence and smoothness of a density 
of Xi|Xo = X, for some t > and x G x M". These easy to check conditions apply in 
particular to multi-factor affine term-structure models on x R" from Duffie and Kan (1996) 
and Dai and Singleton (2000), and Heston's stochastic volatility model. The proof is given in 
Appendix B. 

Theorem 4.1. Assume that the d x {n + l)d-matrix^^ 



(4.5) /C 



m 

E 

.i=l 



ai, diag (0, a) , diag (o, B'Jjoj , . . . , diag (o, {B'}/)^aj 



has full rank. Further, let p be a nonnegative integer with 

bi 

(4.6) p < min 1. 

ie{l,...,m} Oi^ii 

Then Xt\XQ = x admits a density g{S^) of class with support in x and the partial 
derivatives of g{^) of orders 0, . . . ,p tend to as \\^\\ — )■ oo. 

We note that condition (4.6) is sharp and cannot be relaxed in general. Consider for instance 
the scalar square-root diffusion X on M^. with generator Af{x) = axf"{x) + bf'{x). It is well 
known that for any parameter values a > and 6 > 0, the distribution of | Xq = a; is 
noncentral with ^ degrees of freedom and noncentrality parameter see e.g. Filipovic 
(2009, Exercise 10.9). The corresponding density function g{S^) satisfies lim^^o5(0 — 0) 
is therefore of class C'^, if and only if the degrees of freedom — > 2, see .Johnson et al. (1995, 
Chap. 29). This is exactly what condition (4.6) states for p = 0. 

As regards exponential moments of Xt\Xo = x, we combine and rephrase some results from 
Duffie et al. (2003): 



"'^"'^Here, for given d x d-matrices Bi, B2, ■ ■ ■ , B„ the expression [Bi, B2, . . . , Bn] denotes the d x nd-block matrix 
we obtain by putting the matrices next to each other. 
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Theorem 4.2. Assume that the jump measures admit exponential moments 

■''{ii€ii>i} h\m>i} 

for all q in some open neighborhood V of in Mf^. Then the right hand side of (4.2) is analytic 
in ^ £ V. Suppose further that (4.2) admits a V -valued solution ilj{t^ u) with ^'(O, u) = u for all 
t £ [0,T) and for all u in [— ei,ei]"^ x [— e2,e2]", for some ei,e2 > 0. Then Xt\XQ = x has a 
finite exponential moment 

(4.7) E Te'ill'^^^ll+'^ll^ill \Xo = x] < oo 

for all t G [0, T), where we denote Yt = {Xi^t, ■■■ , Xm,tV and Zt = {Xm.+i,t, ■ ■ ■ , Xd,tV ■ 

Proof. That the right hand side of (4.2) is analytic in iIj £ V follows from Duffic ct al. (2003, 
Lemma 5.3). From Duffie et al. (2003, Theorem 2.16 and Lemma 6.5) we then infer that 



E 



^0 



< OO 



for all t G [0,T) and for all q G [— ei,ei]™ x [— e2,e2]"- Combining this with the elementary 
inequality 

1 

Qei\\Yt\\+e2\\Zt\\ < geiE™i|X«t|+^2Et™+il^«t| < ^ g"?-^*, 

|a|=0 

where we denote g„ = ((-l)"iei, . . . , (-l)°'"ei, (-l)°'"+ie2, . . . , (-l)°'^e2)'^ G [-€1,61]"* x 
[-62,62]", proves (4.7). □ 

Note that if m{d^) and ^i{d^) have light tails of the order e^'^l'^l'^d^ for some r > 0, or 
have compact support in particular, then the first assumption of Theorem 4.2 is satisfied for 
y = M'^. Even then, however, the solution i/j{t,u) exists only on a finite time horizon t < 
T < 00 for any nonzero « G M*^ in general. We refer to the discussion of the diffusion case in 
Filipovic and Mayerhofer (2009), see also Filipovic (2009, Chapter 10). 

We further present an additional result which concerns the existence of the marginal transition 
density of integrated affine jump-diffusions, which are not covered by Theorem 4.1. In fact, if 
X is a one-dimensional affine process on R_|_, then the two-dimensional process {dX,X dt)~^ is 
affine again, with state space ffi^. However, its diffusion matrix is degenerate and thus violates 
the conditions of Theorem 4.1. Nevertheless, a slight adaption of its proof yields the existence 
of the marginal transition density of the integrated process J X dt under some more stringent 
conditions. The proof of the following theorem is given in Appendix C. 

Theorem 4.3. Let X be an E.+ -valued affine process with parameters (a = 0,a,b, I3,m, n). 
Further, let p be a nonnegative integer with 

(4.8) P<±-1. 

Then Xgds \ Xq = x admits a a density g{^) of class with support in and the partial 
derivatives of g{£,) of orders 0, . . . ,p tend to as ^ ^ 00. 

From the application point of view, we can rephrase the statements of the preceding theorems 
as follows: 
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Corollary 4.4. Theorems 4-l~4-3 provide conditions in terms of the parameters of the affine 
process X such that the assumptions in Lemmas 3.3-3.5, and thus eventually the validity of 
Assumptions 1 and 2, can explicitly be verified for the density of the (marginal) transition dis- 
tributions of X. 



5. Examples of Auxiliary Density Functions 

For our applications of the polynomial density approximations to affine models we shall make 
the following specific choices for the auxiliary density function w. For positive coordinates we 
use the Gamma density 

(5.1) 7(^;^) ^ 



T[l + D] 



of a r(l + D, l)-distributed random variable. Here, r[-] denotes the Gamma function. It is 
easily seen that conditions (3.1)-(3.4) are satisfied for the appropriate parameters p, eo and ei, 
respectively. 

For real-valued coordinates we employ the bilateral Gamma density from Kiichler and Tappe 
(2008a). The corresponding family of distributions nests, for example, the Variance Gamma 
distribution as a special case. It has very flexible shapes (Kiichler and Tappc, 20081:)). For the 
purposes of this paper we make use of a constrained, standardized version with mean centered 
at zero, unit variance, zero skewness, and excess kurtosis C > 0. We denote this standardized 
bilateral Gamma distribution with Ti,{C). Its characteristic function is given by 



1 / 1 



3/C 

a>r,(u;C) = 216^ ( ' ^ ^' ^ ^ 

By Kiichler and Tappe (2008b, eq. 3.6) the bilateral Gamma distribution has a density given 
by 

(5.2) 7.(e; C) = v'-.\VcJ ^ 

v^r(^) 

where Kn{£,) denotes the modified Bessel function of the second kind. It follows from 
Kiichler and Tappe (2008b, Section 6) that conditions (3.1)-(3.4) are satisfied for the appro- 
priate parameters eo and €2, respectively. The special case with excess kurtosis C = 1/3 leads 
to the following simple expression for the density 76 (0 = IbiC'jC = 1/3), since for half- integer 
indices the modified Bessel functions evaluate to elementary functions 

27p-3v^l5l , 

(5.3) 7b(e) = j= ■ 7776\/2|^|^ + 83160V2|C|^ + 180180a/2|^|^ 



1146880\/2 

-F75075\/2|^| + 1296^^ + 45360^^ + 207900^"^ + 210210^^ + 25025^ . 



Orthonormal polynomial bases can be constructed for any auxiliary density function w which 
has finite exponential moment (3.1) by the following Gram-Schmidt process, which is also used 
in the proof of Lemma 3.1. 



12 



DENSITY EXPANSIONS 



Algorithm 5.1 (Gram-Schmidt Process). 



Ho 

Ha 
Hn 



1 



0<|/3|<|a|,/3^a 
Ha 



HaiO 



(normaUzation) . 



£2 



Notice that degHa = deg^" = |a|, which is due to the linear independence of the set of 
monomials j.^" | a E Nq}. Below are the first five orthonormal polynomials for the Gamma and 
the bilateral Gamma densities 7 and 7^ introduced above. 

Example 5.1. The non-normalized orthogonal polynomials for the Gamma density 7 are the 
generalized Laguerre polynomials, the first five of which are 



Hj{i) 



1, 



-i + D + i 



1 



Hlii) = 7J (r - 2e(I) + 2)+D^ + ?,D + 2) 



(5.4) 



Hlii) 



-f + 3C^(L> + 3) - 3C (L>2 + 5D + 6) +D'^ + QD^ + IID + 6) 



- A{D + 2){D + 3){D + 4)^ + (D + 1)(D + 2){D + 3){D + 4))n. 



The normalization constants are given by 



(5.5) 



H02 



HliO 



Example 5.2. For the standardized bilateral Gamma density 7;, in (5.2), the first five non- 
normalized orthogonal polynomials are 



1 



(5.6) 



Hj{i) 
HT{i)=i 

-c-3)e + e^ 

2 (5C2 + 21C7 + 18) (C^-1) 



Hj'iO 



3(C + 2) 
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,76 



and the corresponding normalization constants HOn 



HOl' = 1 



are given by 



HOl" = 1 



(5-7) „ [tO^ 



ho:''' = a/ — + 9C7 + 6 



HOJ" 



12 {55C^ + 363C73 + 822C2 + 756C + 216) 



V 9(C + 2) 

Example 5.3 (Product Measure). Define the product density w-y-y^^ with support on x M by 

with Gamma density 7 defined in (5.1) and bilateral Gamma density 7;, defined in (5.2). Com- 
bining Lemma 3.2 and Examples 5.1 and 5.2, we obtain the corresponding orthonormal basis of 
polynomials {H2^ ■ HZ\ | (ni, 712) G N§}. 

6. Relation to Existing Approximations 

In this section we recall facts about closed-form density approximations from previous liter- 
ature and relate them to the density expansions of the present paper. A short summary of the 
capabilities and limitations of the different methods is reported in Table 1. 

The closest methodology to the one introduced in Section 2 Ai't-Sahalia (2002). One of 
the key steps is to transform the original process in such a way that a Gaussian-weighted jC^ 
expansion converges. This is motivated by an analogy to the central limit theorem (see also 
Ai't-Sahalia and Yu, 2005, Introduction): the sampling interval (time between observations) A 
plays the role of the sample size n in the central theorem; conditional on a correct standardization 
a iV(0, 1) density turns out to be the correct limiting distribution as n — ?• 00 (in the central limit 
theorem) and as A — t- (for the stochastic process). The correcting Hermite polynomials (the 
pseudo likelihood ratio) then account for the fact that A is not 0. Ai't-Sahalia (2002) applies 
two transformations. The first change of variables yields a unit diffusion process through the 
Lamperti transform. For univariate diffusions it can be shown that such a transformation always 
exists. This step introduces nonlinearities in the drift. The resulting process is then centered, 
and scaled in time. Consequently, a Gaussian- weighted C"^ expansion converges uniformly to 
the true, unknown transition density. This strong convergence result-which of course exceeds 
the mere C'^ convergence- is proved by using a representation of the true, unknown, transition 
density in terms of a Brownian bridge functional from Rogers (1985). Due to the nonlinearities 
in the drift, the coefficients of the A'l't-Sahalia (2002) Hermite expansion are generally not known 
in closed form, however. In practice they are approximated using a Taylor expansion in time in 
terms of the infinitesimal generator of the process. This is a key difference to the setting of the 
present paper, where expansions are constructed precisely such that their coefficients are linear 
in polynomial moments-and those are known for affine processes without approximation error. 

In the multivariate case, however, a Lamperti transform is rarely possible, since most appli- 
cations call for stochastic volatility models which are irreducible (A'i't-Sahalia, 2008, Proposition 
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Approximations 








AS02 


ASY05 


Y07 


AS08 


this paper 


multivariate 


No 


Yes 


Yes 


Yes 


Yes 


everywhere positive 


No 


No 


No 


Yes 


No 


integrates to one 


No 


No 


No 


No 


Yes 


jumps 


No 


Yes 


Yes 


No 


Yes 



Table 1. Comparison of Closed- Form Transition Density Approximations. 

AS02 refers to Ait-Salialia (2002), ASY05 to Ait-Salialia and \ u (2005), Y07 to \u (2007), 
and AS08 to Ait-Sahalia (2008). 



1). An entirely different strategy is therefore pursued in Ait-Sahalia (2008) for the irreducible 
multivariate case, where the log likelihood is expanded in, both, time, and space, so that the 
coefficients of the expansion may be computed from the Kolmogorov forward and backward 
equations. This approach is adopted by Yu (2007) with the difference that he also considers 
jump-diffusions and approximates the transition density itself, rather than the log transition 
density. 

The saddlepoint approach in Ait-Sahalia and Yu (2005) is fundamentally different. It (approx- 
imately) solves the Fourier inversion problem by expanding the cumulant generating function 
about the saddlepoint^^, rather than making use of the Kolmogorov forward and backward equa- 
tions. The maintained assumption here is that the cumulant generating function is available, 
even though for diffusions Ait-Sahalia and Yu (2005, Section 4) circumvent this problem by us- 
ing a Taylor series expansion for small times along the lines of Ai't-Sahalia (2002) for nonlinear 
moments. Though the saddlepoint approach and this paper both facilitate expansion techniques, 
the objects of the expansion are different and the formulae are unrelated. Saddlepoint approx- 
imations are extremely accurate even for low orders (Ait-Sahalia and Yu, 2005, Fig. 2). The 
price to be paid for this precision is the computational burden of having to solve numerically 
for the saddlepoint for every pair of forward and backward variables. 

7. Applications 

In the following we present applications which highlight the usefulness of the transition density 
approximations developed in this paper. For the empirical investigations considered below we 
find that there is a trade-off between numerical accuracy and the order of the expansion. Higher- 
order expansions may perform worse than low-order expansions due to numerical errors that are 
induced by the limited numeric precision of the computer environment in representing very 
large or very small numbers. As a general guideline we suggest matching as many moments 
(cumulants) as possible when choosing weights, and stopping the expansion at a relatively 
low order such as J = 4. For the present section we adopt notation used conventionally in 
finance and econometrics. In particular we deviate from Duffic ct al. (2003) notation. The time 
interval between between observations is generally denoted by A. 



For a stochastic process X denote by K{t, u \ xq) = log the cumulant generating function. 

Suppose Xt\Xo — xo has an absolutely continuous law. For any state x the saddlepoint is defined as the solution 
u = u{t, x, xo) in u to the implicit equation duK{t, u \ xq) = x. 
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We strongly recommend checking the above theoretical foundations for the validation of As- 
sumptions 1 and 2 in numerical applications, as outlined in Example 7.1 below. 

7.1. Basic AfRne Jump-Diffusion (BAJD). We first consider a square-root process with 
exponentially distributed jumps. This process has been recently used in papers that study 
portfolio credit risk (Duffic and Garlcanu, 2001; Mortcnscn, 2006; Eckncr, 2009; Fcldhiittcr, 
2008) where it is termed basic affine jump-diffusion (BAJD), and in a bivariate form in single- 
name credit (Schneider et al., 2010). It can be described in SDK form 

(7.1) dYt = {kO - KYt)dt + ay^tdWt + dKt. 

The intensity of the compound Poisson process K is I > 0, and the expected jump size of the 
exponentially distributed jumps is > 0. The set of parameters we denote by qy = {n9, a, u, 1} 
and the domain of the process V = M+. By Theorem 4.1, 2k9 > a"^ ensures existence of transition 
densities. 

Example 7.1 (Developing an iZ^ Expansion for the BAJD). To exemplify the necessary steps 
to develop a density expansion, we consider here an explicit example and compute an order 
J = 4 density expansion for the BAJD process from eq. (7.1). The weight we use here is a 
Gamma distribution r(l -|- D, 1) with density function 7 from eq. (5.1). 
Step 1. Computing Conditional Moments: The generator of the BAJD is 

Af{x) = {kO - ^x)^^ + + / ^ {f{x + e) - f{x))^e~idi. 

Hence the matrix Q = {qij) from (4.3) relative to the canonical basis {l 4} equals 



/ 


kO + Iv 


2/1/2 


6/zv3 


24/1/4 


\ 





— K 


0-2 + 2k9 + 2lv 


6/1/2 


24/1/3 










-2k 30-2 _ 


f SkO + 


12/1/2 













— 2>K 


60-2 + Ak6 4 


-4/1/ 


V 











-4k 


/ 



Note the upper-triangular form. A symbolic mathematics software package such as Mathematica 
or Maple will be able to compute the matrix exponential e'^^ in closed form. The conditional 
moments ^nivo-, ^1 Qy) = E [Y^ \Yq = y^, qy] may then be obtained by plugging into formula 
(4.4). We obtain the conditional moments as polynomials of order < 4 in the backward variable 
2/0. Below we will suppress dependence of the moments on yoi Qy to lighten notation. 
Step 2. Scaling the Process and Computing the Coefficients: Having computed the first four 
conditional moments, we now introduce a scaled process Fa = and set D = ii\/{n2 — 

/if ) - 1. Note that 

1E[Fa I = yo, qy] = V[Fa I Yo = yo, Qy] = D + 1. 

Hence, in view of Lemma 2.2 we match the first two moments of Ya with the ones of the 
standardized gamma density w = 7(^;D) from eq. (5.1), since expectation and variance of 
7(.^; D) equal D + 1 as well. 

By using the corresponding orthogonal polynomials H2 from (5.4) along with their normal- 
ization constant HOZ from eq. (5.5) we obtain the coefficients of the density approximation 
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(2.1): for each n > we have 



E 

(7.2) Cnivo, A, qy) = — 



H2iYA) I Yo = yo, QY 



HOI 

In particular, the first five coefficients are of the following explicit form: 

co(yo, A, t»y) = 1 
ci(yo, A, gy) = 

C2(yo, A, £>y) = 



C3(yo, A, ^y) 
C4(2/o, A, qy] 



{D + l)({D + 2){D + 3) - i2±2p^ 
V 

y/QyJ{D + l){D + 2){D + 2,) 
{D + 1)V4 + {D + A){D + 1)^1 (3(£> + 2){D + 3)/if - A{D + 1)^3) 
2^/Q^{D + 1)(L» + 2){D + 3)iD + 4)/if 



Note that due to the chosen scaling, the deforming polynomial (the pseudo likelihood ratio) does 
not contribute to the density approximation for the first two orders as predicted by Lemma 2.2. 
Step 3. Verification of Assumption 2: We denote by g and g the density of Ya and Y^, respec- 
tively. The existence of g and therefore of g is ensured by requiring 

2 

k6i > y > 0, 

by Theorem 4.1. By the same result, g and g are of class for the greatest nonnegative integer 
p satisfying 

, , 2k9 
7.3 p< — -l. 

On the other hand, using Theorem 4.2 one can verify numerically, by solving the corresponding 
Riccati differential equations, that 

(7.4) E[e^^] = E[e n < 00. 

This implies finite polynomial moments of g and g, and therefore justifies the calculations 
in Steps 1 and 2, in particular. Note that the Gamma density w{^) = 7(^;D) satisfies 
su]3^^^Qi'^x^ /w{x) < 00 and sup^yie~^ /w{x) < 00. In view of (7.4), Lemma 3.4 implies 
validity of Assumption 2, that is g/w G once D < 2p. By (7.3), the latter holds if and only 

ifl3 

\D/2] < ^ - 1, 
which again can easily be checked numerically. 



[a;] denotes the smallest integer which is greater than or equal to x. 



DENSITY EXPANSIONS 



17 



Step 4. Putting Everything Together: Accounting for the change of variable y{y) = — the 
density proxy equals 

4 

(7.5) g'^^ {y \ yo, A, gy) = 7{y{y)) ■ c,{yo, A, ^>y)i/7(y(y)) • — 

Figure 3 shows how the polynomials Ci{yQ, A, qy)HJ {y{y)) in the pseudo likelihood ratio deform 
the auxiliary density u; = 7 into the right shape. 

7.2. Heston's Model. The Hcston (1993) stochastic variance model has been particularly used 
for the pricing of equity (index) options. The model for the log stock price X and its stochastic 
variance V can be realized as solution of the following SDE 

dVt = {kOv - KvVt) dt + a^/vidW^ 

(7 6) 

dXt = {kOx - \vt)dt + y/Vt (^pdWY + Vl - P^dWi^') , 

with (Vl^^, Vl^"'^) being a two-dimensional standard Brownian motion. The domain V of the 
process equals M4. x M. With 2k9v > and |p| < 1 Theorem 4.1 guarantees existence of 
transition densities. Note that it would be perfectly possible to enrich Heston's model above 
with jumps in both factors (this has been done for example in Duffic ct al. (2000), Erakcr ct al. 
(2003), and Eraker (2004)), to multiple variance factors, or even a matrix-valued variance process 
as in (Da Fonscca ct al., 2008). 

The correlation parameter p above is a device to model the leverage effect which is partly 
responsible for the skew in option prices. Figure 1 displays, using an order 4 expansion, how 
the skew of the density may be altered by decreasing the p parameter. 

For the bivariate Heston model, to compute conditional moments up to order two using 
formula (4.4), the canonical basis is given by {l, v, x, v'^,vx, x^} and the corresponding Q matrix, 
analogous to (4.3), is 



Q = 





k6v 













\ 





— KV 


1 
2 


+ 2Kev 


k9x + per 


1 
















nOv 















—2kv 


1 
2 



















— Ky 


-1 




Vo 

















/ 



Example 7.2 (Standardizing and Scaling the Heston Model). Our goal is to work within a 
space weighted with a product measure w : x M 1— ]R_|_ 

w{i, r])d{i, 1]) = -f{^)d^ ■ lb{j])dri, 

composed of T[D + 1, 1) and T}j{C) densities, from definitions (5.1) and (5.2), respectively. In 
this space, we may use the polynomials from eqs. (5.4) and (5.6). Lemma 3.2 ensures that 
the product of the polynomials forms an ONB in the £^ space. Acknowledging Lemma 2.2 we 
want to make sure that we match as many moments as possible to optimize the quality of the 
approximation. Below we show how this transformation ^ is composed. 
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X 

Figure 1. The Effect of Leverage: The figure shows the efi'ect on skewness of neg- 
ative correlation between the log stock price and its stochastic variance. Depicted are 
approximated transition densities g^l^ of the Heston model for different values of the cor- 
relation parameter p for fixed v — 0.043. The parameters that generated the picture were 
A = 1/52, kOv = 0.04, Kv = 1, o- = 0.2, nOx = 0.03, xq = 5,vo = 0.04. 

We introduce lighter notation by defining Ut = {Vt,Xt) and for the first two moments of 

{Vt,Xt) 



1 1 1 1 1 


p = 




p= 0.2 




p = -0.4 " 


/ 

/ 


\ p= 0.6 


/' 

/ 

/' 


p= 0.8 - 

\ 

■■■.,\ 

\ V\\ 


if''// 

h'j'f 
■■■■i'/ / 
//// 


\ \\\ 
\ \ '\\\ 



E [Ut \Uo = u, qvx] 



and 



V [Ut \Uo = u, Qvx 

For demeaning and block-diagonalizing define 

= Tin + vi 

where 

T - M 

-b/ai 1 

Then cri(f/t) has first two moments of the form 

E[<^i(C/t) I Uo = u,Qvx] - ' 

V[ft(^7j) I Uo = u,Qvx] 



( 


^1] 






ai 


b 


b 


0-2 



Vl 





6^i/ai - ^2 





ai 

-b'^/ai + a2 
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The next transformation scales the process into the optimal form (according to Lemma 2.2) 

?2(u) = T2 • n, 

where 

^ / m/ai 
' VO l/v/-ftV«i+«2 
Then ^2 ° ^i(Ut) has first two moments of the form 





Y[,2O,i{Ut)\Uo = U,0vx]=(^'^'^''' J 



and choosing D = Hi/ai — 1 the bivariate orthogonal expansion of the density of ?2 o ^2{Ut) may 
be performed in terms of the polynomials introduced in (5.4) and (5.6). By the transformation 
the polynomial moments up to second order induced by w agree with the moments of <j2 o(;i(^Ut) 
and the moment-matching Lemma 2.2 applies up to order 2. We have used 

? : u T2 o (Tiu - vi), 

and its inverse is 

The parameter C in (5.2) is set to the exact excess kurtosis of the transformed log stock process 
and the expansion may be performed analogously to Example 7.1. 

7.3. CDO Pricing. In the reduced- form credit risk framework (Lando, 1998), we model the 
stochastic default intensity A of a corporation with a positive process such as (7.1). Under the 
pricing measure Q the default time r of a corporation is then taken to be the first jump of an 
inhomogeneous Poisson process with intensity A. More formally we write the survival probability 
of a corporation (using the short-hand notation [•] = E [• | J^t]) 



<[T>T\Tt] = l{r>t}^t 



All expectations are with respect to the risk-neutral pricing measure Q. For the pricing of port- 
folio credit derivatives, to introduce dependence between different obligors, Duffie and Garlcaiiu 
(2001) (and subsequently Mortensen (2006), Eckncr (2009), and Feldhiitter (2008)) introduce a 
factor intensity model 

(7.7) Ait = Xit + a,Yt, 

where Xn is a firm-specific (idiosyncratic) intensity factor, and is a (systemic) factor common 
to all obligors i = 1, . . . ,n. We model both X and Y with independent jump-diffusion processes 
from eq. (7.1). For n obligors we must impose flj = 1 to ensure identifiability (see Eckner 
(2009)). 

The survival probability of obligor i according to model (7.7) is then due to independence of 
the factors 



(7.8) Q[Ti>T\ Tt] = l{.,>t}Et 



Et 



-ai J J Yu du 
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Defining Zt^x = /j Ysds we may write the default probability of the obligor conditional 



on Zt^T as 

(7.9) qi{Zt,T) =Qt[t<ri<T\ Zt,T] = \r^>t} (l - E 



^—aiZt^T 



and denoting with Pij^^k \ Zi^t) the conditional probability that k of the first n credits in the 
portfolio default between t and T the recursive algorithm of Andersen et al. (2003) then develops 
the number k of defaults conditional on Zt^x as 

,7.10, <'(^l^'^ri-'<-». 

P/!P+')(A: I Zt,T) = q.m+iiZt,T)Pi7ik " 1 I Zt,T) + (1 - g™+i(Zt,T))P/? (^ I ^t,T), 



for < k < n and < ?ti < n. The expressions 



- Xiu du 



but computing the unconditional default probability 



1, . . . , n are unproblematic, 



t,T) 



involves an integration against the density of Zf^T- We can get hold of the distribution of Zt^T 
by investigating the joint evolution of Y from eq. (7.1) and the integral over Y. We therefore 
embed Y into the two-dimensional affine process (Y, Z) described by 

dY = (kO - KYt)dt + ay^tdWt + dKt 
dZt = Ytdt. 

Note that even though the instantaneous covariance matrix of the process (7.11) above is only 
of rank one, this process is a well-defined affine process in the sense of Dufiic ct al. (2003) as 
pointed out also in Section 4.3. Existence of the marginal transition density of Zt^x \ Yt is shown 
in Theorem 4.3 for k6 > cj^. 

In principle the conditional default probabilities from eq. (7.10) may be computed using the 
moment generating function of Zt^T- In real world applications n is typically larger than 100, 
however, and the expressions become intractably large, even for small k. In practice, recursion 
(7.10) is therefore computed through numerical integration. A test of our density expansion in 
this setting may therefore be reduced to the question of how well we can approximate the true 
moment generating function. Below we outline how this approximation can be done in closed 
form. 

Denote by Ej"^^ [/(Z^^t)] the expectation of /{Zt^r) with respect to a J-order expansion instead 
of the true density. Considering the functional form of the expansion (2.1), to approximate the 
expressions Et [e~^*'^'^*] , i = 1, . . . , n we note that we need to perform the computation 

(7.12) eS^) [e«^.T] = / e^^w{i)Y,c,H,{i)di 

J 

(7.13) =Y.'^hU) / e<ew{OdC, 
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le-07 



-le-07 



-10 



Order 2 
Order 10 






a 



10 



Figure 2. True vs. Approximated Moment Generating Function: The figure 
shows the log difference between the true moment generating function Et [e"^*'^l and the 



approximated moment generating function E' 



computed for an order 2 and an 



order 10 expansion of the integrated BAJD from (7.11). The parameters that generated 
the picture were a = l,T - t = 5,Ke = 0.00150602, «: = 0.4648,0" = 0.01,/ = l,u = 
0.0002,7/0 = + Results are computed using Mathematica and the picture is 

generated with a numeric precision of 20 digits. 



where cuij) is implicitly defined as^^ 

i=o i=o 

The chosen weight w for the approximating transition density is a Gamma distribution. To 
compute eq. (7.13) for a random variable Z that is Gamma distributed Z ~ T{a,6) we note 
that 

E e^^Z'^ = i -P- , n E N, a G M, 

'- r(a) 

where F denotes the Gamma function. 

Figure 2 shows that for the order 10 expansion the approximation error is numerically zero. 
The order 2 expansion also works well, with negligible numeric error. 

7.4. Likelihood-based Inference. In this section we investigate the performance of the poly- 
nomial density expansions in likelihood-based inference. For discrete, equally spaced (with time 
interval A) observations [Xq,Xi, . . . ,Xn) = X of a Markov process {Xt)t>o,Xo=xo with domain 



"'^^In practice the coefficients may be coUected using a symbolic mathematics package such as Mathematica or 
Maple. 
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(a) Order 10 Expansion 
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(b) Order 2 Expansion 

Figure 3. Density Plots of the integrated BAJD: The figure shows the density 
of ^o,A I yo,QY from specification (7.11). The parameters generating this density are: 
A = 5, = 0.00150602, k = 0.4648, a = 0.01, 1 = l,u = 0.0002, yo = (k(9 + lu) /k. The 
right y axis shows the deviation error to the true density (obtained by Fourier inversion) 
in percentage terms. 

V and parameters gx we may write the likelihood function Ix ■ ^ Qx ^ ^+ as 

N 

(7.14) lx{X I Qx,XQ) = J{gx{Xi \ Xi_i,Qx,^). 

i=l 
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Denote by 

N 

(7.15) 1^^\X I Qx,Xo) = n^x^l^i I X,_i,gx,A) 

1=1 

the approximate likelihood function using a J order expansion developed in this paper. The 
maximum likelihood estimator qx is obtained as the global maximizer of the likelihood (7.14) 

N 

(7.16) Qx = argmaxTT5x(^i I Xi^i,Qx,A). 

ex -^-^ 

1=1 

Similarly, for a maximizer obtained from the approximate likelihood we write 

N 

(7.17) Q-x^ = arg max TT (Xj | Xi^i,Qx,A). 

sx -^-^ 

1=1 

The Bayesian framework (cf. Robert, 1994, for reference and comparison to other methodologies) 
views the parameters themselves as random variables and is aimed at the posterior density 

/ I Y\ ^x{X I Qx) , X 

(7.18) J l[X I Q)7T{Q)dQ 

OC IxiX I Qx)TTiQx)- 

The prior density it : Qx ^ ^+ expresses the econometrician's personal beliefs and knowledge. 
Its specification may be fueled by economic intuition, for example that nominal interest rates 
should be positive, and also parameter constraints. Note that the expression (7.18) invokes 
Bayes theorem and therefore demands that Ix and vr actually are densities in that they are non- 
negative functions on the domain of the random variable and integrate to one. This requirement 
has been challenged to a great extent. The most common violation stems from expressing 
uninformedness by setting the prior for a parameter q^ with positive domain proportional to a 
constant 

(7.19) 7r(^+) OC — . 

Prior specifications such as the one mentioned above are called improper priors, because their 
integral does not exist. A less common violation arises from the use of closed- form likelihood 
expansions within Bayesian inference for Markov processes. For the univariate likelihood ex- 
pansions from Ai't-Sahalia (2002) the normalization constant may be evaluated through numeric 
integration, putting a heavy computational burden on the econometrician. For the multivariate 
expansions for irreducible models from Ai't-Salialia (2008) the normalization constant does not 
even exist, because the expansions are purely polynomial. In contrast, the expansions devel- 
oped in the present paper integrate to one by construction. They share with the expansions 
from Ai't-Sahalia (2002) the unpleasant feature that they may become negative, however, even 
though experience shows that this happens very rarely. For the empirical studies in this paper, 
for instance, it has not happened even once. 



"'^^See Di Pietro (2001) for an introduction to the problem and Stramer ct al. (2009) for MCMC algorithms to 
overcome it in a very general context. 
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Subsequently we will denote posteriors where the likelihood is approximated using the ap- 
proximate likelihood Ij^^ from (7.15) by 

P^'HQx\X)=l^^\x\0x)7riQx). 

To test both methodologies we generate realizations from models (7.1) and (7.6) through exact 
simulation methods. We then perform both frequentist and Bayesian inference using our density 
approximations and the true density (obtained through Fourier inversion of the characteristic 
function). Frequentist inference is performed on 1,000 data sets generated by model (7.6), to 
acquire information about the sampling distribution of the (approximate) maximum likelihood 
estimators. Bayesian inference is performed on one data set, for the BAJD (eq. (7.1)) and 
Heston's model (eq. (7.6)), respectively. We then compare the posterior distribution originating 
from true density to the posterior distribution generated by the density approximations from 
this paper. 

The simulation for each data set is started from the unconditional mean and then propagated 
forward 600 data points. We discard the first 100 observations to eliminate impact of the initial 
condition. To investigate the behavior of our density expansions for different time horizons 
we choose a monthly observation frequency for the square-root jump-diffusion (7.1) and weekly 
observation frequency for the Heston model (7.6). 

To obtain exact draws from the BAJD we generate exact draws from Yi \ using 
Robert and CascUa (2004, Lemma 2.4). For a uniform random variable U ~ (0, 1) we ex- 
ploit that Gy^{U I yi_i,gy) ~ Gy for any distribution function Gy. We simulate from (7.1) 
using the parameters k6 = 0.04, k = l,a = 0.2, 1 = 3,^ = 0.01. 

Algorithm 7.1 (Exact draws from BAJD process (7.1)). We perform the following procedure 
starting from Yq = E, [Yt], the unconditional mean, for a realization Yi \ 

(i) Draw U ~ (0, 1). Call the realization Uj 

(ii) Use the Newton-Raphson algorithm to compute y : Gy{Y < y \ Qy) = Ui. In this 
step we substitute y = + c to keep y on the positive domain. The floor parameter 
c we set to 10~^ to avoid numerical difficulties. The iteration is then 



Wj+i = Wj - e^""^ (e""^ + 1 



^Gy{Y <c+^ij^\Yi.,,gY)-Ui 
9y{c+-4j^ I Yi_i,gY) 



starting from wq = log ( c-y^ i+i ) " ^^OP iteration at 



w 



Gy{Y < c + ^ I Yi_i, qy) - Ui 

6 ~n 



< £. 



Both, Qy, and, Gy are obtained through Fourier inversion. In our implementation the 
algorithm terminates after 5 to 6 iterations for e = 10~®. 
(iii) Set Yi = c + 'JirTpi increment i and go back to step (1) 

For Bayesian inference we specify an uninformative prior 

1 



(7.20) ti{qy) = 1 



{2k9>(t2,o->0,«>0,!^>0} 



a ■ k9 ■ I ■ V 
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The Heston parameters are Ky = 1,k6v = 0.04,0" = 0.2, k9x = 0.03, p = —0.8. To obtain 
exact draws from this model we refer the reader to the algorithm in Broadie and Kaya (2006). 
For Bayesian inference we specify the prior distribution as 

(7-21) t^{qvx) = 1{2k0v'><t2 -i<p<i,fT>o}^— 

To evaluate the transition density we employ the formulation from Lamoureux and Paseka 
(2005), which may be evaluated using a single numerical integral, instead of the two-dimensional 
Fourier integral. This reduction of dimensionality comes at the price of having to evaluate 
complex- valued special functions, however. 

With 1,000 datasets of weekly realizations from the Heston model, for each dataset we obtain 
parameters Qy-^ by maximizing the log likelihood (7.14), respectively the approximate log likeli- 
hood (7.15). We use the optimizer donlp2 to achieve this task. To relate the density expansions 
of this paper to existing approximations we perform the estimation experiment with 

• the true density (obtained through Fourier inversion) denoted by MLE 

• order 4 likelihood expansions developed in this paper using a product measure with a 
Gamma weight for the variance process and for the log stock variable a 

— bilateral Gamma weight. Specifically we employ formulation (5.3). Estimates are 
denoted by 5G(4) 

— Gaussian weight. Estimates are denoted by G(4) 

• order 2 closed-form likelihood expansions from Ai't-Sahalia (2008) denoted by CF{2) 

• Gaussian approximation using true conditional moments up to order 2 denoted by QML 

Table 3 reveals that the true likelihood function exhibits problematic behavior for some pa- 
rameterizations. Only 688 out of 1,000 estimates turned out to be successful. This is due to 
numerical integration problems that occur in particular for low values of a that arise in the like- 
lihood search. Density expansions developed in this paper are also not entirely unproblematic. 
Numerical errors from evaluating the pseudo likelihood ratio accumulate and induce spikes that 
irritate the optimizer's numerical differentiation routines. The Hermite polynomials used for 
G(4) appear better behaved than the polynomials associated with the bilateral Gamma density 
used in BG{A). 

Table 3a reports bias and RMSE of the estimators. The large bias of 0.2255 for the n param- 
eter is a well-established phenomenon that has also been reported in Ai't-Sahalia and Kimnicl 
(2007). As an overall impression the results suggest that the density approximations developed 
in this paper exhibit parameter estimates with properties similar to the true ML estimates, 
while Ai't-Sahalia (2008) expansions interestingly exhibit lower bias, with the exception of the a 
parameter, but higher RMSE. In Table 3b the first column (Mean) in 'oyx'^ — Qy^^ indicates 
mean deviation from the true ML estimator and the second column (SD) captures statistical 
noise in the estimation. Estimation bias around the MLE for all estimators appears very small. 
Except for the CF{2) estimator, the noise induced through the density approximations is smaller 
than the estimation noise of the true MLE. Surprisingly, the QML estimator, a special case of 
the approximations developed in this paper since it is an order two expansion around a Gaussian, 
performs remarkably well. All around the BG(4) expansions appear to be the preferable choice. 
In particular a*^'~'^^^ — 3=**^^^ and — p^'^^LE pQjj^^ ^j^g right direction, the estimators 

are closer to the true parameters than MLE. 
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The results of the Bayesian inference study also appear promising. Inspecting Figures 5 we see 
that an order 2 expansion already delivers reasonable results, while the order 4 expansion seems 
to be even closer to the posterior density obtained from the true density function. To assess 
how close the posteriors Pyx a-nd Py^x densities are to the posterior obtained through the true 
transition density pvx we compute Kolmogorov-Smirnov tests. The results can be seen in Table 
2. They suggest that while Py\ appears to be quite different from pvx-, Pyx is statistically 
almost indistinguishable from the true posterior pvx for the majority of the parameters. 

7.5. Option Pricing. Heston's model (7.6) is used for option pricing because it may be con- 
sistent with the implied volatility skew that can be inferred from market prices. As such it is 
much more compatible with real data than say, the Black-Scholes model. In stock (index) option 
pricing the quantity of interest are marginal transition probabilities of the log stock price X. 
We therefore engineer an approximation directly around the marginal density of \ Xq, Vq by 
expanding gx in C"^^. We set the constant C from (5.2) to the excess kurtosis of X. 

Recall that the price of a European call option with maturity A and strike price K is given 

by 

C(A, K) = e-'^^E [(e^^ - A')+ \Xo = x, = v, qvx 

/ \ 

(7.22) coo fOO 

= 6"''^ / e^gx{C\x,v,Qvx,^)d(,-K gx{i\x,v, qvx, ^)di 

JlogK JlogK 

' V ' ' V ' 

\ HA{/^,K) HB{A,K) j 

In accordance with the previous sections we will denote C('^)(A, AT), and similarly if^("')(A, K) 
and HB^'^\A,K), the option price computed with g^^^ instead of gx- Denoting Q{X < x) the 
transition probability (and accordingly Q^"^) (X < x) the J order approximation of the transition 
probability) we have that HB^-^\A,K) = 1 - Q(-'^)(X < log AT), and using the standardization 
from Example 7.2 and the change of variables formula 

(A, K) = l- g^i'^ (e|x, V, Qvx)d^ 

(-) ^i-^/':%.(^) (:.|;.,.,....,.?(i^)).e 

= 1 ^ V Tfe — , ^ Mx, V, QVX)- 

Here, Hj'' are from eqs. (5.6) and (5.7) and i9i{x,v, gvx) are implicitly defined as 

(i+Yl c^(^, V, Qvx)H]' f ^T^) ) = E ^*(^' evx)e- 



^=l V . ^ 



The function Tfy{K, n) = S.^'lbiO^S, is explicit in terms of the Gamma function and regular- 
ized generalized hypergeometric functions. The constituent HA^'^^ of the approximate call price 
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Figure 4. Closed-form option pricing in Heston's model Panel 4a shows implied 
Black-Scholes volatility of the true option price, here computed using the Carr and Madan 
(1999) dampened Fourier inversion approach and the C'*' (A, K) option pricing formula 
from the approximation of eq. (7.22) as a function of strike K. The second panel 4b 
shows the density function of \ Xq, Vo to indicate the likelihood-moneyness trade off. 
The parameters for the model (7.6) behind the pictures are A = 1/52, k6v ~ 0.04, kv = 
1, (T = 0.2, kOx = 0.03, p = -0.8, Xo = 5.1, Vo = 0.04. 



from eq. (7.22) can be computed by numeric integration. Figure 4 shows that option pricing 
performance is very good. 

Remark 7.2. Collecting coefficients to compute from Section 7.5 eq. (7.23) by hand is very 
error-prone. Instead we recommend using a symbolic mathematics software package such as 
Mathematica or Maple. 

8. Discussion 

This paper develops a general framework for density approximations for affine processes us- 
ing orthonormal polynomial expansions in well-chosen weighted spaces. We provide novel 
existence and smoothness results for their transition densities in particular. 

The approximations are designed to exploit the explicit polynomial moments of affine pro- 
cesses to compute the coefficients of the expansion without approximation error and in closed 
form; the computational burden is concentrated only in the initial calculation of the coeffi- 
cients of the expansions. Once they are implemented, evaluation is rapid, avoiding the heavy 
computational cost of Fourier methods to obtain transition densities. Empirical applications in 
credit risk, likelihood-based parameter inference, and option pricing suggest that the density 
expansions are very accurate. 

The paper leaves a number of open points for future research. The first question concerns 
approximations in higher-order weighted Sobolev spaces. One might suspect that approximation 
of (sufficiently smooth) densities in weighted higher-order Sobolev spaces are superior to 
expansions. In particular, it could be expected that (i) the quality of approximation might be 
better (ii) Sobolev embedding theorems could be applied to infer global uniform convergence. 
However, quite contrary to the case, it is unknown whether the space of polynomials is dense 
in weighted Sobolev spaces. Higher-order Sobolev spaces also impose heavy restrictions on the 
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functional form of the approximation weights, which in turn lead to very slow convergence rates. 
Indeed, preliminary numerical experiments suggest that the price for global, uniform convergence 
which potentially comes with higher-order Sobolev spaces is a very slow convergence rate. 

Another route worth pursuing is a compact truncation of the state space, such that approx- 
imations could be performed in non-weighted Sobolev spaces, for which there is more theory 
available in the literature. 

Suitable approximation weights (such as the bilateral gamma weight of this paper) are a 
research topic of its own, and they lead to non-trivial problems in the theory of special functions. 
Also, density expansions for processes on state spaces different from the canonical ones would 
be highly desirable. As an example we mention the class of matrix-valued processes used in co- 
volatility modeling (Lcippold and Trojani, 2008; Da Fonseca et al., 2008; Buraschi ct al., 2008). 



Appendix A. Proofs for Section 3 
This appendix gathers the proofs of the lemmas in Section 3. 

Proof of Lemma 3.1. That the set of polynomials is dense in £^ is shown in Bernard (1995, 
Lemma 1). The assumption made in Bernard (1995) that w is strictly positive can easily be 
omitted by replacing point-wise equality "= 0" by "= i(;(^) d^-a.s." at the end of the proof 
of Bernard (1995, Lemma 1). An orthonormal basis of polynomials {H^^ \ a S Nq} of £^ with 
degi^Q = |ct| is obtained by applying the Gram-Schmidt process to the linearly independent set 
of monomials {^°^ \ a G Nq}, see Algorithm 5.1. □ 

Proof of Lemma 3.2. That the product density w on has finite exponential moment (3.1) 
follows from the elementary inequality ||^|| < X]i=i ^r all ^ G W^. The orthonormality of 
Ha follows from the easily verifiable relationship 

d 

Moreover, every monomial = • • • can be written as a product of linear combinations 
of the respective orthonormal polynomials 

j=0 

It follows that the set {Hq. \ a G Nq} is dense in and hence forms an orthonormal basis of 
Cl. This proves the lemma. □ 

Proof of Lemma 3. 3. The lemma follows from the estimate 

^d^= I 5(e) ^-^e^«ii 5(0 rfe 

I „ — enllxll \ I- 

e'""^" g{i)di < oo. 

□ 
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Proof of Lemma 3.4- A Taylor expansion of g{x) around gives g{x) = ^^j^x^ for some ^ = 
^(x) G [0,x]. Since d^g is continuous we conclude that there exists some finite constant K such 
that g{x) < K x'P for all x G [0, 1]. We then obtain 

Jo Jo w{0 Ji w{0 

1-1 t2p / p-<^ox\ roo 

<K^ S^d^ + snpi g{x) — - / e^««5(6 < oo. 

Jo x>i V ^i^) J Ji 

This proves the lemma. □ 

Proof of Lemma 3.5. Let x £ I, and let i* be such that Xj* = minjXj. A Taylor expansion 
of g{x,y) in xi* around x-i* = gives g{x,y) = — Xi*^ for some ^ = (,{x,y) G X. Since 

dxig{x,y) is bounded on X x R" we conclude that there exists some finite constant K such that 
gi^j y) ^ K miuj x^ for all (x, y) G X x M". We then decompose 

with 



s«/ / ""°:i:,;""" ^-*"^K.''>'g-*' 

min,- X? e"*^^"^" 



<ii: sup ^^ ^ / / e'^™ g{i,r])didr]<oo 

(x,j/)eXxR" V w{x,y) 



and 



/ / 



— — — d^fir/ 

^illClh^alhll 



(l,oo) 

< sup \g{x,y)- — 1 / / e^ill4ll+^^l"'llg(4,??)ded^? 

(x,j/)e(l,oo)'"xR" y 'W[X,y) y J(l,oo)'" JR" 

< oo. 

This proves the lemma. □ 



Appendix B. Proof of Theorem 4.1 

First we note that the existence and smoothness properties of a density on M'^ for Xt\Xo = x 
is invariant with respect to non-singular linear transformations of the state vector Xf. In view 
of Filipovic (2009, Theorem 10.7) there exists a non-singular linear transformation of the state 
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(B.l) a^ 



vector Xt mapping P = M™ x R" onto itself, and which renders block diagonal matrices aj in 
the form 

diag(0, . . . ,0,ai,ii,0, . . . ,0) 
ai^jj 

so that ai x = ai^uxf + x~jaijj xj for all x G M'^. Moreover, this transformation does not 
affect the upper diagonal element ai^a (see the proof of Fili];)ovic (2009, Lemma 10.5)), which is 
important in view of the criterion (4.6). Hence without loss of generality we shall from now on 
assume that the matrices are of the block diagonal form (B.l). 

We now recall a classical result on characteristic functions P of probability measures ly, see 
Sato (1999, Proposition 28.1): 

Lemma B.l. Let u be a probability measure on W^. Assume its characteristic function D{m) = 
e'""''^ i^((i^) satisfies 

\i'{m)\ \\u\\^ du < oo 



for some nonnegative integer k. Then v has a density h(x) of class and the partial derivatives 
of h{x) of orders 0, . . . ,k tend to as \\x\\ — )■ oo. 

It thus remains to prove the appropriate integrability of the affine characteristic function 
(4.1), that is, the appropriate tail behavior in n G M*^ of the functions (j){t,\u) and %l:[t,m). 
The following lemma is our core result, which together with Lemma B.l completes the proof of 
Theorem 4.1. 

Lemma B.2. The following properties are equivalent: 

(i) The d X (n + \)d-matrix fC given in (4.5) has full rank. 

(ii) For any t > 0, the d x d-matrix 

A{t)= / diag(0,e^^J^ae^^^")(is + t Voi 

is nonsingular. 

(iii) For any t > there exists an e > such that the cones 

Co = lueM.'^ \u] ( [ e'^Jj'ae'^-''''ds] uj > e\\uf 







|ii G M'^ I n~''ai n > e||n||^| , i = l,...,m 



cover . 



. That is, \JILo Ci 



Moreover, any of the above properties, (i), (ii), or (iii), implies that 



(B.2) 



for all nonnegative numbers p < minjgji^ „j| — 1. 

The remainder of this section is devoted to the proof of Lemma B.2. Let t > and u G M'^\{0}. 
We first claim that u^/C = if and only if A{t) = 0, which proves equivalence of (i) and (ii). 
To prove the claim, note that since A{t) is positive semidefinite, vJ A{t) = is equivalent to 



u\r du < oo 
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A(t)u = 0. Since each of the summands in A(t) is positive semidefinite, this again is equivalent 

to 



^ = and Uje^Jj" a = for all s G [0, t\. 

i=l 

A power series expansion of e^^J* shows that this is equivalent to 

m 

^ = and ul{B)j)^a = for ah k G Nq. 



The Cayley-Hamilton theorem (see (Horn and Johnson, 1990, Theorem 2.4.2)) implies that, for 
all k > n, Bjj is a linear combination of Id, Bjj, . . . , BjJ^. Whence the above property is 
equivalent to vJ }C = 0, which proves the claim. 

The equivalence of (ii) and (iii) follows from the identity 

u^A{t) u = uj ( [ e^^ii" a e^-^-^** ds) uj + tY] ai u 

and the fact that each of the summands is nonnegative. This establishes the first part of 
Lemma B.2. 

As for the second part of Lemma B.2, we note that as a consequence of (B.l) the real and 
imaginary parts 

f{t,iu) = ^tpitjiu) and g(t,m) = Qilj{t,m) 
of "0 satisfy the following system of Riccati equations, for i = 1, . . . ,m: 

dth = ai,iiff - g'ai g + B,f + j (e^^« cos [g" i) - l) ^l,{di) 
MO) = 



dtgi = '^fi ai,ii gi+Big + j e^^^ sin (5"^?) fJ-i{dO 



gi{0) = Ui 
gj = e^-'^'uj 

In the sequel we will make use, without further notice, of the fact that fi is ]R„-valued for all 
i = 1, . . . ,m, and that fj = for all j £ J. In particular, it follows from above that fi satisfies 
the following system of differential inequalities 

dtfi < ai^ii fi - g~^ai g + Bufi, i = l,...,m. 

For any u 0, we now define the scaled functions 

F{t,u) = -l-f(-L-,m 

\\u\\ \ \\u\\ 

B.3 " " ;" " 

I ( t 
G{t,u) = ——g -rT—r7,m 

\\u\\ V \\u\\ 
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Then F and G satisfy, for i = 1, . . . , m: 

dtFi < ai^ii Ff-G^aiG + -^^Bu Fi 

\\u\\ 

Fi{0) = 
Fj = 

dtGi = 2Fi ai^ii Gi + -^{B,+ /C,) G 



.u\\ 

Ui 



\U\\ 



Gi{0) 

\\u\\ 

Gj = e^^^'PL 
\\u\\ 

where we define the d x d-matrix IC = IC{t,u) by its 1 x d-row vectors 

[O, i = m + I, . . . ,d, 

and we have used the simple fact that 

ell"ll^^« sin {\\u\\G'^^) = el!"ll^^« J^' ^ sin [s\\u\\G'^ ^) ds 

IG^^e^M^^^ j\os ds. 

It fohows by the assumptions on fii that the matrix IC is uniformly bounded 

sup \\IC{t, u)\\ = K < oo 

t,u 

where the constant K only depends on the measures fii. The squared norm of G thus satisfies 

dt\\Gf = 2G'^dtG = 4G^diag(F) G + G^ (B + IC) G 

\\u\\ 

<-^{\\B\\ + K)\\Gf 
\\u\\ 

||G(0)f = 1. 

We shall now and in the sequel make use of the following comparison result, which is a special 
case of a more general theorem proved by Volkmann (1972): 

Lemma B.3. Let R{t,v) be a continuous real map on M+ x M and locally Lipschitz continuous 
inv. Let p(t) and q{t) be differentiable functions satisfying 

±pit)<Rit,p{t)) 

j^qit)=R{t,q{t)) 
p{0) < qiO). 

Then we have p(t) < q{t) for all t > 0. 
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Applying Lemma B.3 to the above differential inequality for ||G|| we obtain 
(B.5) IIGf < eW(ll«ll+^^)*. 

For any i S {1, . . . , m} we then obtain the differential inequality 

dt (c^aiG^ = 2G^aidtG 

= AG'^ai diag(F) G + ^ G^ ai{B + IC)G 

> Aai,ii Fi ai,ii Gj - + k) 

> 4«,. G^a. G-'-ff{\m+K) e wdl^H^^)* 

\\u\\ 

G(0)Ta,G(0) = ^f^ 

IfII 

where we have used the fact that Ui^a Gf < G~^ ai G and (B.5) for the last inequality. Lemma B.3 
again yields the lower bound 



2||a,;| 



B\\+K) re4a„.ri^.Mrfrg^(ll6|l+/0(*-)^, 

^0 ' y, ' 



<i 

,T„,.„, , 2 



Combining this with the differential inequality (B.4) for F we obtain 

,T 

u 

~\2 



(B.6) / 2 



\\u\\ 

II ( A(l|B||+AOt i\ 



+ r 

F(0) = 0. 

We arrive at the following intermediate result. 

Lemma B.4. For every e > and to > there exists some p > and R > such that 

Fi{to,m) < -p 

for all u G M*^ with \\u\\ > R and vJ a-i u > e||u|p, for all i G {1, . . . , m}. 

Proof. The differential inequality (B.6) is autonomous and smooth in Fi. Moreover, the initial 
slope satisfies 

dtFi{t,m)\t=o < — n~lT9~ ^ 
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uniformly in i and u in the designated set. Also notice the estimate 

\\u\\ R 

and the uniform bound on the last summand on the right hand side of (B.6) for t < tQ and 
||n|| > R. The claim now follows from Lemma B.3. □ 

Below we shall make use of the following is easy to check auxiliary result on Riccati equations: 

Lemma B.5. Let yl > 0, G M \ {0}, and to > 0, and Gq < 0. For t > Iq, the solution of 

dtG{t) = AG{tf + BGit), G{to) = Go 

is of the form 

^ ' [AGq + B)- AGoeS(*-*o) ■ 

IfB = 0, then 

r(f) = ^0 

l_AGoit-to)- 

From (B.4) we deduce the trivial differential inequality 

\\u\\ 

By Lemma B.5, the solution of 

dth = Ui^ii + h 
\\u\\ 

with /i(to) < is explicitly given by 



e,-. 



P|(*-*o) 

h{t) = X , t > to- 



B,, h{to) 

Together with Lemmas B.3 and B.4 and we thus obtain that 

(B.7) F,{t, iu) < ^ , t > to 

for all > R with u^OiU > e||u|p. By rescaling we infer 
fi{t,m) = \\u\\Fi{t\\u\\,'m) 



\u\\e "V 

< 



1 ''.log fMSlii -iV-V t> '» 



ai ii dt \ Bii \ J pj \\u 
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for all ||u|| > i? with u aiu > e||u|| . Integrating this inequality yields 

t f-t 
fi{s,m) ds < fi{s,m) ds 





(B. 



1 



1 



log ( \\u\\^ ( e^''(*" w) - 1^ + i ) - log I - 



log(p|Hl|^(e^''(*-^)-l)+l), 



for all > R with aiu > e||^ip. We arrive at the following key result, which completes the 
proof of Lemma B.2. 

Lemma B.6. Let i E {1, . . . , m}. For every e > and t > there exists some R > and C > 
such that 



<C(l + ||u||) 



for all u G R*^ with \\u\\ > R and vJaiU > e||u|p. Moreover, 

R^{t,m) < -uj (^J^ e'^JJ'ae'^^'ds ] uj 

for all u€W^. 

Proof. Integration of (4.2) implies 

3? (^4>{t, in) + in)'^^) < 3?(/)(t, iu) 

<-uj([ e'^Jj'ae'^-'-''ds]uj + bi I fi{s,m)ds. 



Together with (B.8), this proves the lemma. 



□ 



Appendix C. Proof of Theorem 4.3 

Fix some 7 > 0. We consider the two-dimensional process Z = {X,Y), where Yt = y + 
7 Jq Xfds, with y € It is easy to see that {X, Y) is an affine process with state space M^. In 
particular, if y = we have that = 7 Jq has an exponentially affine characteristic function 
of the form 



E [e'^^' I Xo 



x\ = e 



(j){t,iv)+ip{t,iv)x 



V G 



where the characteristic exponents (p and ip satisfy the generalized Riccati differential equations 

m = 

tp{<d) = 0. 
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For any f 7^ 0, we define the scaled functions^^ 

F{t,w) = ^LkV' ( 

1 f t ^ 

G{t,iv) = — -=^11^ — i=,'^v 



Then F and G satisfy 

dtF = a{F^-G^) + ^F + ^ reV\^\^^(cos(^\FA-l)fz{dO 
V |v| hi JO ^ ^ ' ' 

F{0) = 

(^■^) a 7; 1 I — / \ 

^tG = 2aFG + + 7^ + ^ / eVl^l^^sin ( yl^Fe ) ^{di) 

G(0) = 

We now prove a first intermediary result, which is analogous to Lemma B.4: 
Lemma C.l. There exists some to > 0, p > and R > such that 

Fito,w) < -p 

for all V with \v\ > R. 

Proof. For v — )• ±00, the solutions F{t,w) and G{t,[v) of (C.l) converge locally uniformly in t 
to the solutions Foo (t) and Goo (t) of the system 

dtF^ = a{Fl- Gl) 



i^oo(O) =0 

dtG^ = 2aF^G^ ± 1 
Goo(O) = 0. 

Since dtGoo{t)\t=o = ±1 it follows that there exists some ti > such that Goo{t) / for all 
t £ (0,ii). This again implies that Foo(io) < for some to £ (0,ti), and the lemma follows. □ 

From (C.l) we deduce the trivial differential inequality 

dF < aF"^ + -^F. 



\v\ 

Arguing as for the derivation of (B.7), we then obtain together with Lemmas B.3 and C.l that 

^(i-io) 



\v\^eVH' -1 +i 



'Note that here we have to scale by \/\v\, which is in contrast to the proof of Theorem 4.1, see (B.3). 
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for all G M with \v\ > R. By rescaling and integrating we infer, arguing as for the derivation 
of (B.8), that 



^'ip{s,w) ds < I ^ip^s^iv) ds 
log 



1 

a 



1 



a 



log p^/\v 



a / /3 h 



(3 



VmJ -1+1 



log ( - 



t > 



to 



for all f G M with \v\ > R. Similarly as in Lemma B.6 we now infer that 



<C(l + \v\ 



Combining this with Lemma B.l completes the proof of Theorem 4.3. 

Appendix D. Figures and Tables 
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Table 2. Kolmogorov-Smirnov test statistics: The table displays p- values for a 
two-sided Kolmogorov-Smirnov test applied to posterior density pvx to V^y'x '^^'^ Pyx 
using prior (7.21) for the Heston model in the left panel. The right panel displays p-values 
for the test applied to py to and Py'', the BAJD model. The prior distribution for 
this model is defined in eq. (7.20). The true posterior is defined in eq. (7.18) and the 
approximate posterior densities are defined in eq. (7.19). 
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QML 


G(4) 


CF(2) 


#Success 688 841 


982 


949 


981 



Table 3. Heston Estimation Success: The table reports the number of estimation 
successes on 1,000 datasets generated as exact draws from the Heston model using the 
technology from Broadic and Kaya (2006). Estimation success is defined by the optimizer 
meeting the termination criterion, which is a function of the norm of the gradient of the log 
likelihood function. The density approximations used are BG(4), a fourth order expansion 
using a Bilateral Gamma weight for the log stock variable and a Gamma weight for the 
variance variable, G(4), a fourth order expansion using a Gaussian weight for the log 
stock variable and a Gamma weight for the variance variable, QML denotes a Gaussian 
approximation using the true conditional moments up to order 2, and CF(2) denotes the 
second-order likelihood expansions from Ai't-Sahalia (2008). The optimizer used in the 
likelihood search is donlp2. 



(a) Bias 



^rnuE 


MLE BG{A) QML G(4) CF{2) 
Bias RMSE Bias RMSE Bias RMSE Bias RMSE Bias RMSE 


K 1 

K0V 0.04 
a 0.2 

K0X 0.03 
p -0.8 


0.2255 0.4253 
0.0093 0.0166 
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-0.0016 0.0147 - 


0.2313 0.4168 
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0.2327 0.4187 
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-0.0050 0.0624 
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(b) Estimation Noise 



^TRUE 

Qvx 


->kMLE JTRUE ^-«'j'(4) ^MLE ^*QML -^MLE c*t''(4) ^MLE -^MLE 

tivx Qvx t^vx t^vx yvx tivx i^vx yvx i^vx tivx 
Mean SD Mean SD Mean SD Mean SD Mean SD 


K 1 
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kOx 0.03 
p -0.8 
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0.0069 0.1313 
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-0.0001 0.0015 
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Table 4. Heston Asymptotic Assessment: Panel (a) displays bias and RMSE of 
the MLE estimator and the approximated MLE estimators. Panel (b) displays mean and 
standard deviation of ML estimation bias as well as the first two moments of the difference 
between the MLE estimator and the approximated MLE estimators. Computed over a 
sample of 1,000 datasets, all of which generated as exact draws from the Heston model 
using the technology from Broadic and Kaya (2006). For a given dataset only parameter 
estimates were taken into consideration where all five estimators converged. Out of 1,000 
this left 578 samples. The number of estimation successes is reported in Table 3 above. 
Approximate estimators are obtained through BG(4), a fourth order expansion using a 
Bilateral Gamma weight for the log stock variable and a Gamma weight for the variance 
variable, G(4), a fourth order expansion using a Gaussian weight for the log stock variable 
and a Gamma weight for the variance variable, QML denotes a Gaussian approximation 
using the true conditional moments up to order 2, and CF(2) denotes the second-order 
likelihood expansions from Ait-Sahalia (2008). 
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Figure 5. Posterior Densities for Heston's model: The figure displays the mar- 
ginal posterior distributions of the parameters of Heston's model conditional on the data. 
Bayesian estimation is performed using prior specification (7.21) with the true transition 
density gvx obtained through Fourier inversion, closed-form density up to second (gj^x ), 
and fourth order (g^x). 
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